Multilingual corpora with coreferential annotation of person entities

نویسندگان

  • Marcos García
  • Pablo Gamallo
چکیده

This paper presents three corpora with coreferential annotation of person entities for Portuguese, Galician and Spanish. They contain coreference links between several types of pronouns (including elliptical, possessive, indefinite, demonstrative, relative and personal clitic and non-clitic pronouns) and nominal phrases (including proper nouns). Some statistics have been computed, showing distributional aspects of coreference both in journalistic and in encyclopedic texts. Furthermore, the paper shows the importance of coreference resolution for a task such as Information Extraction, by evaluating the output of an Open Information Extraction system on the annotated corpora. The corpora are freely distributed in two formats: (i) the SemEval-2010 and (ii) the brat rapid annotation tool, so they can be enlarged and improved collaboratively.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Arabic anaphora resolution: corpora annotation with coreferential links

Annotated resources are much needed for evaluation and training of anaphora resolution systems. The coreferential chain annotation is a difficult task which can not be realised without an appropriate tool. In this paper, we present our work on Arabic corpora annotation with anaphoric links (i.e., the annotation of the identity relation between the anaphors and their antecedents). In particular,...

متن کامل

Multilingual Supervision of Semantic Annotation

In this paper, we investigate the annotation projection of semantic units in a practical setting. Previous approaches have focused on using parallel corpora for semantic transfer. We evaluate an alternative approach using loosely parallel corpora that does not require the corpora to be exact translations of each other. We developed a method that transfers semantic annotations from one language ...

متن کامل

Predicative NPs and the annotation of reference chains

In the development of machine learning systems for identification of reference chains, hand-annotated corpora play a crucial role. This paper concerns the question of how predicative NPs should be annotated w.r.t. coreference in corpora for such systems. This question highlights the tension that sometimes appears in the development of corpora between linguistic considerations and the aim for pe...

متن کامل

Shared Resources for Multilingual Information Extraction and Challenges in Named Entity Annotation

Progress in natural language processing requires increasing amounts of data and annotation in a growing variety of languages, and research in named entity extraction is no exception. While the value of richlyannotated, large-scale multilingual corpora is undeniable, costs for producing such data are high, underscoring the value of shared resources. As part of the US Governmentsponsored Automati...

متن کامل

CLinkA A Coreferential Links Annotator

The annotation of coreferential chains in a text is a difficult task, which requires a lot of concentration. Given its complexity, without an appropriate tool it is very difficult to produce high quality coreferentially annotated corpora. In this paper we discus the requirements for developing a tool for helping the human annotator in this task. The annotation scheme used by our program is deri...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014